Paper: Generalization of Reinforcement Learners with Working and Episodic Memory

Neural Information Processing Systems

We thank the reviewers for their thoughtful and constructive feedback on our manuscript. Reviewer 3 noted that the Section 2 task descriptions could be better presented; we have reformatted them, which should help both contextualize each task's difficulty and illustrate what it involves. We also changed our description of IMPALA to match Reviewer 5's suggestion. Regarding the task suite, Reviewer 4 raised a thoughtful consideration on whether most of the findings would translate to 2D versions of the tasks. Some 3D tasks in the suite already have '2D-like' semi-counterparts that do not require navigation, '2D-like' because everything is fully observable and the agent has a first-person point of view from a fixed point. The Spot the Difference level was overall harder than Change Detection for our ablation models.


KARL: Kalman-Filter Assisted Reinforcement Learner for Dynamic Object Tracking and Grasping

Boyalakuntla, Kowndinya, Boularias, Abdeslam, Yu, Jingjin

arXiv.org Artificial Intelligence

We present Kalman-filter Assisted Reinforcement Learner (KARL) for dynamic object tracking and grasping with eye-on-hand (EoH) systems, significantly expanding such systems' capabilities in challenging, realistic environments. In comparison to the previous state of the art, KARL (1) incorporates a novel six-stage RL curriculum that doubles the system's motion range, thereby greatly enhancing its grasping performance, (2) integrates a robust Kalman filter layer between the perception and reinforcement learning (RL) control modules, enabling the system to maintain an uncertain but continuous 6D pose estimate even when the target object temporarily exits the camera's field of view or undergoes rapid, unpredictable motion, and (3) introduces mechanisms that allow retries to gracefully recover from unavoidable policy execution failures. Extensive evaluations in both simulation and real-world experiments qualitatively and quantitatively corroborate KARL's advantage over earlier systems, achieving higher grasp success rates and faster robot execution speed. Source code and supplementary materials for KARL will be made available at: https://github.com/arc-l/karl.

Humans, and animals in general, interact with the physical world by observing and handling everyday objects [1], which makes object tracking and manipulation arguably the most fundamental skills for physical intelligence. In robotics, autonomous grasping in stationary settings has been extensively studied [2], [3], typically using decoupled vision and manipulation sub-systems in which the camera does not move with the manipulator. While effective for static tasks, this approach struggles in dynamic scenarios where objects move or become occluded. Real-world interactions, such as handovers, require continuous tracking and adaptive grasping, highlighting the need for more integrated solutions.
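The Kalman filter layer described in point (2) can be sketched minimally. Below is an illustrative constant-velocity filter over 3D position only (the full system tracks 6D pose, i.e. orientation as well); the class name, noise parameters, and `step`/`predict`/`update` interface are assumptions for illustration, not KARL's actual code.

```python
import numpy as np

class PositionKalmanFilter:
    """Constant-velocity Kalman filter over 3D position.

    State x = [px, py, pz, vx, vy, vz]; measurements are noisy positions.
    When the object leaves the camera's view, we skip the update step and
    keep predicting, so the pose estimate stays continuous (with growing
    uncertainty), as the abstract describes.
    """

    def __init__(self, dt=0.05, process_var=1e-3, meas_var=1e-2):
        self.x = np.zeros(6)                      # state estimate
        self.P = np.eye(6)                        # state covariance
        self.F = np.eye(6)                        # constant-velocity transition
        self.F[:3, 3:] = dt * np.eye(3)
        self.H = np.hstack([np.eye(3), np.zeros((3, 3))])  # observe position only
        self.Q = process_var * np.eye(6)          # process noise
        self.R = meas_var * np.eye(3)             # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q

    def update(self, z):
        y = z - self.H @ self.x                   # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)  # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(6) - K @ self.H) @ self.P

    def step(self, measurement=None):
        """One filter tick; measurement is None when the object is not visible."""
        self.predict()
        if measurement is not None:
            self.update(np.asarray(measurement, dtype=float))
        return self.x[:3]
```

Calling `step(None)` while the object is out of view keeps the prediction running, so the estimate stays continuous while its covariance `P` grows, matching the "uncertain but continuous" behavior described above.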


Reviews: Generalization of Reinforcement Learners with Working and Episodic Memory

Neural Information Processing Systems

The authors do a good job of motivating their work, and they contribute a nice experimental section with good results. The ablation study was thorough. Well done! --- Many tasks that might be given to an RL agent are impossible without working memory. This paper presents a suite of tasks which require use of that memory in order to succeed. These tasks are compiled from a variety of other sources, either directly or re-implemented for this suite.



Generalization of Reinforcement Learners with Working and Episodic Memory

Neural Information Processing Systems

Memory is an important aspect of intelligence and plays a role in many deep reinforcement learning models. However, little progress has been made in understanding when specific memory systems help more than others and how well they generalize. The field also has yet to see a prevalent, consistent, and rigorous approach for evaluating agent performance on holdout data. In this paper, we aim to develop a comprehensive methodology to test different kinds of memory in an agent and to assess how well the agent can apply what it learns in training to a holdout set that differs from the training set along dimensions we suggest are relevant for evaluating memory-specific generalization. To that end, we first construct a diverse set of memory tasks that allow us to evaluate test-time generalization across multiple dimensions. Second, we develop an agent architecture that combines multiple memory systems, perform multiple ablations on it to obtain baseline models, and investigate its performance on the task suite.
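The evaluation protocol described above (train on one set of task variants, then measure performance on held-out variants that differ along chosen dimensions) can be written generically. The environment interface and function names below are illustrative assumptions, not the paper's code.

```python
def run_episode(agent, env):
    """Roll out one episode; `agent` is any callable mapping observation -> action.
    Assumed toy interface: env.reset() -> obs, env.step(a) -> (obs, reward, done)."""
    obs, done, total = env.reset(), False, 0.0
    while not done:
        obs, reward, done = env.step(agent(obs))
        total += reward
    return total

def generalization_gap(agent, train_tasks, holdout_tasks, episodes=20):
    """Mean return on training variants vs held-out variants.

    `train_tasks` / `holdout_tasks` are lists of zero-argument environment
    constructors; the holdout variants differ from training along some
    dimension (e.g. delay length, number of distractors)."""
    def mean_return(tasks):
        returns = [run_episode(agent, make_env())
                   for make_env in tasks for _ in range(episodes)]
        return sum(returns) / len(returns)

    train_score = mean_return(train_tasks)
    holdout_score = mean_return(holdout_tasks)
    return train_score, holdout_score, train_score - holdout_score
```

The returned gap (train minus holdout score) is one simple summary of memory-specific generalization along the chosen dimension.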


Implementation of a Model of the Cortex Basal Ganglia Loop

Arakawa, Naoya

arXiv.org Artificial Intelligence

This article presents a simple model of the cortex-basal ganglia-thalamus loop, which is thought to serve for action selection and executions, and reports the results of its implementation. The model is based on the hypothesis that the cerebral cortex predicts actions, while the basal ganglia use reinforcement learning to decide whether to perform the actions predicted by the cortex. The implementation is intended to be used as a component of models of the brain consisting of cortical regions or brain-inspired cognitive architectures.
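The hypothesized division of labor (the cortex proposes an action; the basal ganglia learn, by reinforcement, whether to execute it) can be illustrated with a toy Go/NoGo gate trained by a simple tabular update. This is a didactic sketch with assumed names and reward conventions, not the article's implementation.

```python
import random

class BasalGangliaGate:
    """Toy sketch of the cortex / basal-ganglia split described above: the
    "cortex" proposes an action for the current state, and this gate keeps
    learned Go/NoGo values deciding whether the proposal is executed.
    Values are updated from reward feedback, a crude stand-in for
    dopamine-driven reinforcement learning."""

    def __init__(self, lr=0.1, epsilon=0.1):
        self.values = {}      # (state, proposed_action) -> {"go": v, "nogo": v}
        self.lr = lr
        self.epsilon = epsilon

    def decide(self, state, proposal):
        v = self.values.setdefault((state, proposal), {"go": 0.0, "nogo": 0.0})
        if random.random() < self.epsilon:            # occasional exploration
            return random.choice(["go", "nogo"])
        return max(v, key=v.get)                      # greedy Go/NoGo choice

    def learn(self, state, proposal, choice, reward):
        v = self.values[(state, proposal)]
        v[choice] += self.lr * (reward - v[choice])   # tabular update toward reward
```

With rewarding "go" in one context and punishing it in another, the gate learns to execute the cortical proposal only where it pays off.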


EARL: Eye-on-Hand Reinforcement Learner for Dynamic Grasping with Active Pose Estimation

Huang, Baichuan, Yu, Jingjin, Jain, Siddarth

arXiv.org Artificial Intelligence

In this paper, we explore the dynamic grasping of moving objects through active pose tracking and reinforcement learning for hand-eye coordination systems. Most existing vision-based robotic grasping methods implicitly assume target objects are stationary or moving predictably. Grasping unpredictably moving objects presents a unique set of challenges. For example, a pre-computed robust grasp can become unreachable or unstable as the target object moves, and motion planning must also be adaptive. In this work, we present a new approach, Eye-on-hAnd Reinforcement Learner (EARL), for enabling coupled Eye-on-Hand (EoH) robotic manipulation systems to perform real-time active pose tracking and dynamic grasping of novel objects without explicit motion prediction. EARL addresses many thorny issues in automated hand-eye coordination, including fast tracking of 6D object pose from vision, learning a control policy for a robotic arm to track a moving object while keeping the object in the camera's field of view, and performing dynamic grasping. We demonstrate the effectiveness of our approach in extensive experiments on multiple commercial robotic arms, in both simulation and complex real-world tasks.
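One ingredient the abstract mentions, learning a policy that tracks a moving object while keeping it in the camera's field of view, is commonly encoded as a shaped reward term. A minimal illustrative version follows; the image resolution, penalty value, and function name are assumptions for illustration, not EARL's actual reward.

```python
import math

def tracking_reward(target_px, image_size=(640, 480), out_of_view_penalty=-1.0):
    """Shaped reward for keeping a tracked object in view.

    target_px: (u, v) pixel of the tracked object, or None when it is not
    detected. Reward is 1.0 with the object at the image center, decays to
    0.0 at a corner, and is a fixed penalty when the object leaves the frame.
    """
    if target_px is None:
        return out_of_view_penalty
    w, h = image_size
    u, v = target_px
    if not (0 <= u < w and 0 <= v < h):
        return out_of_view_penalty
    # Distance from the image center, normalized so a corner sits at 1.0.
    du = (u - w / 2) / (w / 2)
    dv = (v - h / 2) / (h / 2)
    return 1.0 - math.hypot(du, dv) / math.sqrt(2)
```

A term like this, summed with grasp-progress rewards, pushes the policy toward viewpoints that keep the perception module fed with observations of the target.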


Demanding and Designing Aligned Cognitive Architectures

Holtman, Koen

arXiv.org Artificial Intelligence

With AI systems becoming more powerful and pervasive, there is increasing debate about keeping their actions aligned with the broader goals and needs of humanity. This multi-disciplinary and multi-stakeholder debate must resolve many issues; here we examine three of them. The first issue is to clarify what demands stakeholders might usefully make on the designers of AI systems, useful in the sense that the technology to implement them already exists. We make this technical topic more accessible by using the framing of cognitive architectures. The second issue is to move beyond an analytical framing that treats useful intelligence as reward maximization only. To support this move, we define several AI cognitive architectures that combine reward maximization with other technical elements designed to improve alignment. The third issue is how stakeholders should calibrate their interactions with modern machine learning researchers. We consider how current fashions in machine learning create a narrative pull that participants in technical and policy discussions should be aware of, so that they can compensate for it. We identify several technically tractable but currently unfashionable options for improving AI alignment.


2022 Trends in Data Science: Newfound Ease and Accessibility - insideBIGDATA

#artificialintelligence

Despite the obvious impact of the most salient macro level trends impacting data science--including Artificial Intelligence, cloud computing, and the Internet of Things--the ends of this discipline remain largely unchanged from when it initially emerged nearly 10 years ago. The goal has always been to equip the enterprise with tailored solutions spanning technological approaches that not only justify, but also maximize the use of data for fulfilling the most meaningful business objectives at hand. Oftentimes, those involve the upper end of the analytics continuum in the form of predictive and prescriptive measures. Currently, cognitive computing deployments factor substantially into data scientists' abilities to complete this task. Ergo, the most profound developments affecting this space in 2022 reduce the traditional impediments to devising the underlying models that support applications of Natural Language Processing, cognitive search, image recognition, and other advanced analytics manifestations.